Abstract

In the context of designing a scalable overlay network to support decentralized
topic-based pub/sub communication, the Minimum Topic-Connected Overlay
problem (Min-TCO for short) has been investigated: Given a set of t topics and
a collection of n users together with the lists of topics they are interested in, the
aim is to connect these users to a network by a minimum number of edges such
that every graph induced by users interested in a common topic is connected.
It is known that Min-TCO is NP-hard and approximable within O(log t) in
polynomial time.

In this paper, we further investigate the problem and some of its special
instances. We give various hardness results for instances where the number of
topics in which a user is interested is bounded by a constant, and also for the
instances where the number of users interested in a common topic is constant.
For the latter case, we present a first constant approximation algorithm. We
also present some polynomial-time algorithms for very restricted instances of
Min-TCO.

1. Introduction



Recently, the spread of social networks and other content-sharing services has
enabled the development of many-to-many communication, often supported by
these services. Publishers publish information through a logical channel that
is consumed by subscribed users. This environment is often modeled by
publish/subscribe (pub/sub) systems, which can be classified into two
categories. When the channels are associated with a collection of attributes
and the messages are delivered to a subscriber only if their attributes match
user-defined constraints, we speak about content-based pub/sub systems. In
topic-based pub/sub systems, each channel is associated with a single topic and
the messages are distributed to the users via channels according to their topic
selection. There are numerous implementations of pub/sub systems, for details



In our paper, we focus on topic-based peer-to-peer pub/sub systems. In
such a system, subscribers interested in a particular topic have to be connected
without the use of intermediate agents (such as servers). Many aspects of such
a system can be studied (see [9], [20]). Minimizing the diameter of the overlay
network can minimize the overall time in which a message is distributed to
all the subscribers. When minimizing the (average) degree of nodes in the
network, the subscribers need to maintain a smaller number of connections. In
this paper, we study the minimization of the overall number of connections in the
system. A small number of connections may be necessary due to maintenance
requirements, or may be helpful because information can then be aggregated
into a single message and broadcast to the network, amortizing the header
overhead of otherwise small messages.

We study here the hardness of Minimum Topic-Connected Overlay (Min-TCO),
which was studied in different scenarios in [2], [9], [17], [18]. In Min-TCO, we
are given a collection of users, a set of topics, and a user-interest assignment,
and we want to connect the users in an overlay network G such that all users
interested in a common topic are connected and the overall number of edges in G
is minimal. The hardness of the problem was studied in [9] and [2]. In [9], the
inapproximability by a constant was proved and a logarithmic-factor
approximation algorithm was presented. In [2], the lower bound on the
approximability of Min-TCO was improved to $\Omega(\log n)$, where $n$ is the
number of users.

Moreover, we focus here on special instances of Min-TCO. We study the case 
where, for each topic, there is a constant number of users interested in it. We 
also consider the case where the number of topics in which any user is interested 
is bounded by a constant. We believe that such restrictions on the instances 
have wide practical applications such as when a publisher has a limited number 
of slots for users or the user's application limits the number of topics that he/she 
can follow. 

In the study of the general Min-TCO, we extend the method presented in [9]
and design an approximation-preserving reduction from instances of the
minimum hitting set problem to instances of Min-TCO. This reduction not
only proves a lower bound similar to that in [2], but also shows that Min-TCO is
LOGAPX-complete and thus, concerning approximability, equivalent to such
a famous problem as minimum set cover. As our reduction does not blow up
the number of users interested in a common topic, it is also an
approximation-preserving reduction for the case where the number of
users interested in a common topic is limited to a constant. Furthermore, we
design a one-to-one reduction of these instances to special instances of the
hitting set problem. As these special instances of the hitting set problem are
approximable within a constant factor, we immediately obtain the first
approximation algorithm for our special instances. This, together with our
approximation-preserving reduction, shows that the restriction of Min-TCO to
such special instances is APX-complete. Finally, due to the one-to-one
reduction and the properties of the special instances of the hitting set
problem, we show the existence of a polynomial-size kernel and a non-trivial
exact algorithm, both for the instances of Min-TCO where the number of users
interested in a common topic is bounded by a constant.

For the case where the number of topics of Min-TCO is bounded from above
by $(1 + \varepsilon(n))^{-1} \cdot \log \log n$, for e(n) > logiogCl^j^/'aTogTogiogn ($n$ is the number
of users), we present a polynomial-time algorithm that computes the optimal
solution.

In the study of instances where the number of topics any user is interested in
is restricted to a constant, we show that, if this number is at most 6, Min-TCO
cannot be approximated within a factor of 694/693 in polynomial time, unless
P = NP, even if any pair of two users is interested in at most three common
topics.

The paper is organized as follows. Section 2 is devoted to the preliminaries
and a summary of known results. The hardness, approximation results,
kernelization and an exact algorithm for instances of Min-TCO, where we limit the
number of users interested in a common topic by a constant, are discussed in
Section 3. This section also provides the discussion about LOGAPX-completeness
of the general Min-TCO. The results related to the instances of Min-TCO, where
the number of topics that each user is interested in is constant, are presented in
Section 4. Section 5 contains a polynomial-time algorithm that solves Min-TCO
when the number of topics is small. The conclusion is provided in Section 6.

2. Preliminaries 

In this section, we define basic notions used throughout the paper. We
assume that the reader is familiar with notions of graph theory. Let $G = (V, E)$
be an undirected graph, where $V$ is the set of vertices and $E$ is the set of
edges. Let $V(G)$ and $E(G)$ denote the set of vertices and the set of edges of $G$,
respectively. We denote by $E[S]$ the set of edges of $G$ in the subgraph induced
by the vertices from $S \subseteq V$, i.e., $E[S] = \{\{u, v\} \in E \mid u, v \in S\}$. The graph
induced by $S \subseteq V$ is denoted as $G[S] = (S, E[S])$. By $N[v]$ we denote the closed
neighborhood of vertex $v$, i.e., $N[v] = \{u \in V \mid \{u, v\} \in E\} \cup \{v\}$. A graph $G$ is
called connected if, for any $v_1, v_\ell \in V$, there exists a path $(v_1, v_2, \ldots, v_\ell)$ such
that $\{v_i, v_{i+1}\} \in E$ for all $1 \le i < \ell$.

Let $x$ be an instance of an optimization problem (in this paper, Min-TCO,
Min-VC or Min-HS). By $|x|$ we denote the size of this instance, i.e., the
number of vertices and topics of an instance of Min-TCO and the number of
elements and sets of an instance of Min-HS. For a set $S$, $|S|$ denotes the size of
the set, i.e., the number of its elements.



The set of users or nodes of our network is denoted by $U = \{u_1, u_2, \ldots, u_n\}$.
The topics are $T = \{t_1, t_2, \ldots, t_m\}$. Each user subscribes to several topics. This
relation is expressed by the user interest function $\mathrm{INT} : U \to 2^T$. The set of
all vertices of $U$ interested in a topic $t$ is denoted by $U_t$. For instance, if user
$u \in U$ is interested in topics $t_1$, $t_3$ and $t_4$, then we have $\mathrm{INT}(u) = \{t_1, t_3, t_4\}$ and
$u \in U_{t_1}, U_{t_3}, U_{t_4}$. For a given set of users $U$, a set of topics $T$, and an interest
function INT, we say that a graph $G = (U, E)$ with $E \subseteq \{\{u, v\} \mid u, v \in U \wedge u \ne v\}$
is $t$-topic-connected, for $t \in T$, if the subgraph $G[U_t]$ is connected. We call the
graph topic-connected if it is $t$-topic-connected for each topic $t \in T$. Note that
the topic-connectedness property implies that a message published for topic $t$
is transmitted to all users interested in this topic without using non-interested
users as intermediate nodes.
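
The definition of topic-connectedness can be checked mechanically. The following is a minimal sketch, assuming that the interest function INT is given as a Python dictionary from users to sets of topics; the function name `is_topic_connected` and the sample data are illustrative and not part of the problem definition.

```python
from collections import defaultdict


def is_topic_connected(interests, edges):
    """Return True if G[U_t] is connected for every topic t.

    interests: dict mapping each user to the set INT(user) of topics.
    edges: iterable of pairs (u, v) of users.
    """
    # Build U_t for every topic t.
    users_of = defaultdict(set)
    for user, topics in interests.items():
        for topic in topics:
            users_of[topic].add(user)

    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    for members in users_of.values():
        # Depth-first search that never leaves U_t.
        start = next(iter(members))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in (adjacency[u] & members) - seen:
                seen.add(v)
                stack.append(v)
        if seen != members:
            return False
    return True


interests = {"a": {"t1", "t2"}, "b": {"t1"}, "c": {"t2"}}
print(is_topic_connected(interests, [("a", "b"), ("a", "c")]))  # True
print(is_topic_connected(interests, [("a", "b"), ("b", "c")]))  # False
```

In the second call, users a and c are joined only through b, who is not interested in t2, so the graph is not t2-topic-connected.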

The most general problem that we study in this paper is called Minimum 
Topic Connected Overlay: 

Problem 1. Min-TCO is the following optimization problem:

Input: A set of users $U$, a set of topics $T$, and a user interest function
$\mathrm{INT} : U \to 2^T$.

Feasible solutions: Any set of edges $E \subseteq \{\{u, v\} \mid u, v \in U \wedge u \ne v\}$ such
that the graph $(U, E)$ is topic-connected.

Costs: Size of $E$.

Goal: Minimization.
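
For very small instances, Problem 1 can be solved by exhaustive search, which is handy for checking examples by hand. The sketch below is an exponential-time illustration only, assuming the same dictionary representation of INT; the helper names `_connected` and `min_tco_bruteforce` are hypothetical.

```python
from itertools import combinations


def _connected(members, edges):
    """Check that the graph induced on `members` by `edges` is connected."""
    members = set(members)
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == u and y in members and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == members


def min_tco_bruteforce(interests):
    """Try edge sets by increasing size; exponential, for tiny inputs only."""
    topics = set().union(*interests.values())
    groups = [{u for u in interests if t in interests[u]} for t in topics]
    candidates = list(combinations(sorted(interests), 2))
    for size in range(len(candidates) + 1):
        for edges in combinations(candidates, size):
            if all(_connected(g, edges) for g in groups):
                return list(edges)


interests = {"a": {"x", "y"}, "b": {"x"}, "c": {"y"}, "d": {"x", "y"}}
solution = min_tco_bruteforce(interests)
print(len(solution))  # 3
```

Each topic has three users and therefore needs two edges; the two topics can share only the edge between a and d, so three edges are optimal here.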

In this paper, we also study some of its special instances. We restrict the
number of users that are interested in a common topic, i.e., the size of $U_t$,
to a constant. We also study the instances where each user is interested in a
constant number of topics. The definitions necessary for these special instances
are summarized in the beginning of the corresponding section.

We refer here to the famous minimum hitting set problem (Min-HS) and
minimum set cover problem (Min-SC). In Min-HS, we are given a system of sets
$\mathcal{S} = \{S_1, \ldots, S_m\}$ on $n$ elements $X = \{x_1, \ldots, x_n\}$ (i.e., $S_j \subseteq X$). A feasible
solution of this problem is a set $H \subseteq X$ such that $S_j \cap H \ne \emptyset$ for all $j$. Our
goal is to minimize the size of $H$. Min-SC is the dual problem to Min-HS.
In this problem, we are given a system of sets $\mathcal{S} = \{S_1, \ldots, S_m\}$ on $n$ elements
$X = \{x_1, \ldots, x_n\}$; a feasible solution is a set $\mathcal{S}' \subseteq \mathcal{S}$ of sets such that for all $i$
there exists $j$ such that $x_i \in S_j \in \mathcal{S}'$, and the goal is the minimization of the size
of $\mathcal{S}'$.

There are many modifications and subproblems of the hitting set problem
that are intensively studied. In our paper, we refer to the d-HS problem,
a restriction of Min-HS to instances where $|S_i| \le d$ for all $i$.
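
The duality between Min-HS and Min-SC can be made concrete on a toy instance. The sketch below assumes non-empty sets given as Python sets of sortable elements; the functions `dual_set_cover`, `min_hitting_set` and `min_set_cover` are illustrative brute-force helpers, not algorithms from the literature.

```python
from itertools import combinations


def dual_set_cover(sets):
    """Map a Min-HS instance to Min-SC: the universe consists of the set
    indices, and every element x becomes the set C_x = {j : x in S_j}."""
    elements = sorted(set().union(*sets))
    universe = set(range(len(sets)))
    family = {x: {j for j, s in enumerate(sets) if x in s} for x in elements}
    return universe, family


def min_hitting_set(sets):
    """Smallest H with H & S_j non-empty for all j (exhaustive search)."""
    elements = sorted(set().union(*sets))
    for size in range(len(elements) + 1):
        for cand in combinations(elements, size):
            if all(s & set(cand) for s in sets):
                return set(cand)


def min_set_cover(universe, family):
    """Smallest subfamily whose union is the universe (exhaustive search)."""
    for size in range(len(family) + 1):
        for cand in combinations(sorted(family), size):
            covered = set().union(*(family[x] for x in cand))
            if covered >= universe:
                return set(cand)


sets = [{1, 2}, {2, 3}, {3, 4}]
hitting = min_hitting_set(sets)
cover = min_set_cover(*dual_set_cover(sets))
print(len(hitting), len(cover))  # 2 2
```

An element x hits the set S_j exactly when the dual set C_x covers the index j, which is why both optima coincide.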

Min-HS is equivalent to Min-SC ([3]), thus all the properties of Min-SC
carry over to Min-HS. Following from these properties, we have
LOGAPX-completeness of Min-HS ([10]) and APX-completeness of d-HS ([21]). There is
a well-known d-approximation algorithm for d-HS ([5]), it can be approximated



with ratio d - ~inra "" ( UM ) ' it is NP-hard to approximate it within a factor
$(d - 1 - \varepsilon)$ ([11]) and d-HS is not approximable within a factor better than $d$,
unless the unique games conjecture fails ([15]).
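
One standard way to reach ratio d for unweighted d-HS is to repeatedly take a set that is not yet hit and add all of its elements. The sketch below shows this folklore variant; it is an assumption that it matches the cited algorithm in spirit only, and the name `approx_hitting_set` is hypothetical.

```python
def approx_hitting_set(sets):
    """Pick every set that is still unhit and add all of its elements.

    The picked sets are pairwise disjoint, so any hitting set needs one
    element per picked set; with |S_i| <= d the result is at most d times
    larger than an optimal hitting set.
    """
    hitting = set()
    for s in sets:
        if not hitting & s:
            hitting |= s
    return hitting


sets = [{1, 2}, {2, 3}, {3, 4}]   # a 2-HS instance whose optimum is 2
print(sorted(approx_hitting_set(sets)))  # [1, 2, 3, 4]
```

On this instance the sketch returns four elements while two suffice, which meets the guarantee of factor d = 2 exactly.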



We use the standard definitions from complexity theory (for details see [13]):

• For NPO problems in the class PTAS, there exists an algorithm that,
for arbitrary $\varepsilon > 0$, produces a solution in time polynomial in the input
size (but possibly exponential in $1/\varepsilon$) that is within a factor $(1 + \varepsilon)$ from
optimal.

• The NPO problems in the class APX are approximable by some constant-factor
approximation algorithm in polynomial time.

• For NPO problems in the class LOGAPX, there exists a polynomial-time
logarithmic-factor approximation algorithm.

Thus

$\mathrm{PTAS} \subset \mathrm{APX} \subset \mathrm{LOGAPX}$.

Definition 1. Let $A$ and $B$ be two NPO minimization problems. Let $I_A$ and
$I_B$ be the sets of the instances of $A$ and $B$, respectively. Let $S_A(x)$ and $S_B(y)$ be
the sets of the feasible solutions and let $\mathit{cost}_A(x)$ and $\mathit{cost}_B(y)$ be polynomially
computable measures of the instances $x \in I_A$ and $y \in I_B$, respectively. We say
that $A$ is AP-reducible to $B$ if there exist functions $f$ and $g$ and a constant
$\alpha > 0$ such that:

1. For any $x \in I_A$ and any $\varepsilon > 0$, $f(x, \varepsilon) \in I_B$.

2. For any $x \in I_A$, for any $\varepsilon > 0$, and any $y \in S_B(f(x, \varepsilon))$, $g(x, y, \varepsilon) \in S_A(x)$.

3. The functions $f$ and $g$ are computable in polynomial time with respect to
the sizes of instances $x$ and $y$, for any fixed $\varepsilon$.

4. The time complexity of computing $f$ and $g$ is nonincreasing with $\varepsilon$ for all
fixed instances of size $|x|$ and $|y|$.

5. For any $x \in I_A$, for any $\varepsilon > 0$, and for any $y \in S_B(f(x, \varepsilon))$,

$\frac{\mathit{cost}_B(y)}{\min\{\mathit{cost}_B(z) \mid z \in S_B(f(x, \varepsilon))\}} \le 1 + \varepsilon$ implies
$\frac{\mathit{cost}_A(g(x, y, \varepsilon))}{\min\{\mathit{cost}_A(z) \mid z \in S_A(x)\}} \le 1 + \alpha \cdot \varepsilon$.



3. Results for Min-TCO When The Number of Users Interested in a 
Common Topic is a Constant 

In this whole section, we denote by a triple $(U, T, \mathrm{INT})$ an instance of
Min-TCO. We focus here on the case where the number of users that share a topic
$t$, i.e., $\max_{t \in T} |U_t|$, is bounded.

We present here a lower bound on the approximability, a constant approximation
algorithm and an APX-completeness proof for these restricted instances
of Min-TCO.



3.1. Hardness results 

It is easy to see that, if $\max_{t \in T} |U_t| \le 2$, then Min-TCO can be solved in
linear time, because two users sharing a topic $t$ have to be directly connected by
an edge, and these edges form the unique minimum solution.

Theorem 1. If $\max_{t \in T} |U_t| \le 2$, then Min-TCO can be solved in linear time.
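
The argument behind Theorem 1 translates directly into code. The sketch below assumes that the instance is given as a dictionary from topics to their sets $U_t$; the name `solve_pairs` is hypothetical.

```python
def solve_pairs(users_of_topic):
    """Optimal Min-TCO solution when every topic has at most two users.

    Every topic shared by exactly two users forces the edge between them,
    and no other edge is needed, so one pass over the topics suffices.
    """
    edges = set()
    for topic, members in users_of_topic.items():
        if len(members) > 2:
            raise ValueError(f"topic {topic!r} has more than two users")
        if len(members) == 2:
            edges.add(frozenset(members))
    return edges


instance = {"x": {"a", "b"}, "y": {"a", "b"}, "z": {"c"}, "w": {"b", "c"}}
print(len(solve_pairs(instance)))  # 2
```

Topics x and y force the same edge between a and b, topic z needs no edge, and topic w forces the edge between b and c.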

We extend the methods from [9] and design an AP-reduction from d-HS to
Min-TCO, where $\max_{t \in T} |U_t| \le d + 1$.

Theorem 2. For arbitrary $d \ge 2$, there exists an AP-reduction from d-HS to
Min-TCO, where $\max_{t \in T} |U_t| \le d + 1$.

Proof. Let $I_{HS} = (X, \mathcal{S})$ be an instance of d-HS and let $\varepsilon > 0$ be arbitrary.
We omit the subscript in the functions $\mathit{cost}_{d\text{-HS}}$ and $\mathit{cost}_{\text{Min-TCO}}$ as they are
unambiguous. For the instance $I_{HS}$, we create an instance $I_{TCO} = (U, T, \mathrm{INT})$
of Min-TCO with $\max_{t \in T} |U_t| \le d + 1$ with $|X| + k$ users, where
$k = |X|^2 \cdot \lceil (1 + \varepsilon)/\varepsilon \rceil$, as follows (the function $f$ in the definition of
AP-reduction).

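$U = X \cup \{p_i \mid p_i \notin X \wedge 1 \le i \le k\}$,
$T = \{t^i_{S_j} \mid S_j \in \mathcal{S} \wedge 1 \le i \le k\}$,
$\mathrm{INT}(x) = \{t^i_{S_j} \mid x \in S_j \wedge S_j \in \mathcal{S} \wedge 1 \le i \le k\}$ for $x \in X$,
$\mathrm{INT}(x) = \{t^i_{S_j} \mid S_j \in \mathcal{S}\}$ for $x = p_i$.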

Observe that the instance contains $k \cdot |\mathcal{S}|$ topics and its size is polynomial
in the size of $I_{HS}$. The users interested in a topic $t^i_{S_j}$ ($S_j \in \mathcal{S}$) are exactly the
elements that are members of set $S_j$ in d-HS plus a special user $p_i$ ($1 \le i \le k$).
Let $\mathit{Sol}_{TCO}$ be a feasible solution of Min-TCO on instance $I_{TCO}$. We partition
the solution into levels. Level $i$ is a set $L_i$ of the edges of $\mathit{Sol}_{TCO}$ that are incident
with the special user $p_i$. In addition, we denote by $L_0$ the set of edges of $\mathit{Sol}_{TCO}$
that are not incident with any special user. Therefore, $\mathit{Sol}_{TCO} = \bigcup_{i=0}^{k} L_i$ and
$L_i \cap L_j = \emptyset$ ($0 \le i < j \le k$).

We claim that, for any $L_i$ ($1 \le i \le k$), the set of the non-special users
incident with edges of $L_i$ is a feasible solution of the instance $I_{HS}$ of d-HS. This
is true since, if a set $S_j \in \mathcal{S}$ is not hit, none of the edges $\{x, p_i\}$ ($x \in S_j$) is in
$L_i$. But then the users interested in topic $t^i_{S_j}$ are not interconnected as user $p_i$
is disconnected.

Let $j$ be chosen such that $L_j$ is the smallest of all sets $L_i$, for $1 \le i \le k$.
We construct $\mathit{Sol}_{HS}$ by picking all the non-special users that are incident to
some edge from $L_j$ (the function $g$ in the definition of AP-reduction). Denote
an optimal solution of d-HS and Min-TCO for $I_{HS}$ and $I_{TCO}$ by $\mathit{Opt}_{HS}$ and
$\mathit{Opt}_{TCO}$, respectively.

If we knew $\mathit{Opt}_{HS}$, we would be able to construct a feasible solution of
Min-TCO on $I_{TCO}$ as follows. First, we pick the edges $\{x, p_i\}$, $x \in \mathit{Opt}_{HS}$, for all
special users $p_i$, and include them in the solution. This way, for any topic
$t \in \mathrm{INT}(p_i)$, we connect $p_i$ to some element of $X$ that is interested in $t$, too.
To obtain a feasible solution, some edges between elements of $X$ may still be
missing. So, we pick all the edges between elements from $X$. The feasible solution
of Min-TCO on $I_{TCO}$ that we obtain has roughly cost

$k \cdot \mathit{cost}(\mathit{Opt}_{HS}) + |X|^2 \ge \mathit{cost}(\mathit{Opt}_{TCO})$.

On the other hand, if we replace all levels $L_i$ ($1 \le i \le k$) by level $L_j$ in
$\mathit{Sol}_{TCO}$, we still have a feasible solution of Min-TCO on $I_{TCO}$, with cost possibly
smaller. Thus

$k \cdot \mathit{cost}(\mathit{Sol}_{HS}) \le \mathit{cost}(\mathit{Sol}_{TCO})$.

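We use these two inequalities to bound the cost of $\mathit{Sol}_{HS}$:

$k \cdot \mathit{cost}(\mathit{Sol}_{HS}) \le \frac{\mathit{cost}(\mathit{Sol}_{TCO})}{\mathit{cost}(\mathit{Opt}_{TCO})} \cdot \left(k \cdot \mathit{cost}(\mathit{Opt}_{HS}) + |X|^2\right)$

and thus

$\frac{\mathit{cost}(\mathit{Sol}_{HS})}{\mathit{cost}(\mathit{Opt}_{HS})} \le \frac{\mathit{cost}(\mathit{Sol}_{TCO})}{\mathit{cost}(\mathit{Opt}_{TCO})} \cdot \left(1 + \frac{|X|^2}{k}\right)$.

If $\mathit{cost}(\mathit{Sol}_{TCO}) / \mathit{cost}(\mathit{Opt}_{TCO}) \le 1 + \varepsilon$ and $\alpha := 2$, then we have

$\frac{\mathit{cost}(\mathit{Sol}_{HS})}{\mathit{cost}(\mathit{Opt}_{HS})} \le (1 + \varepsilon) \cdot \left(1 + \frac{|X|^2}{k}\right) \le (1 + \varepsilon) \cdot \left(1 + \frac{\varepsilon}{1 + \varepsilon}\right) = 1 + 2\varepsilon$.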

It is easy to see that the five conditions of Definition 1 are satisfied and thus
we have an AP-reduction. □
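
The two functions of the reduction can be sketched in a few lines. The code below is an illustration under stated assumptions: the special user $p_i$ is encoded as the tuple `("p", i)`, the topic $t^i_{S_j}$ as the pair `(i, j)`, the Min-TCO instance is represented by the sets $U_t$, and edges between two special users are ignored when levels are formed. The names `build_tco_instance` and `extract_hitting_set` are hypothetical.

```python
import math
from fractions import Fraction


def build_tco_instance(elements, sets, eps):
    """Function f: map a d-HS instance (X, S) to the sets U_t of Min-TCO.

    Special user p_i is encoded as ("p", i); topic t^i_{S_j} as (i, j).
    """
    k = len(elements) ** 2 * math.ceil((1 + eps) / eps)
    users_of_topic = {
        (i, j): set(s) | {("p", i)}
        for i in range(1, k + 1)
        for j, s in enumerate(sets)
    }
    return k, users_of_topic


def extract_hitting_set(k, solution_edges):
    """Function g: non-special endpoints of the smallest level L_i."""
    def is_special(user):
        return isinstance(user, tuple) and len(user) == 2 and user[0] == "p"

    levels = {i: set() for i in range(1, k + 1)}
    for u, v in solution_edges:
        for special, other in ((u, v), (v, u)):
            if is_special(special) and not is_special(other):
                levels[special[1]].add(other)
    return min(levels.values(), key=len)


elements, sets = [1, 2, 3], [{1, 2}, {2, 3}]
k, users_of_topic = build_tco_instance(elements, sets, Fraction(1))
print(k, len(users_of_topic))  # 18 36

# A feasible solution: level 1 uses two edges, all other levels use one.
edges = [(1, 2), (2, 3), (1, ("p", 1)), (3, ("p", 1))]
edges += [(2, ("p", i)) for i in range(2, k + 1)]
print(extract_hitting_set(k, edges))  # {2}
```

Using `Fraction` for $\varepsilon$ keeps the ceiling in the definition of $k$ exact.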

Corollary 1. For any $\delta > 0$ and any polynomial-time $\alpha$-approximation algorithm
for Min-TCO with $\max_{t \in T} |U_t| \le d + 1$, there exists a polynomial-time
$(\alpha + \delta)$-approximation algorithm for d-HS.

Proof. The approximation algorithm for d-HS would use Theorem 2 with
$k := |X|^2 \cdot \lceil \alpha/\delta \rceil$. □

Our theorem also implies the following negative results on approximability.
One of them holds if the unique games conjecture is true. This conjecture is
discussed, for example, in [24] and was introduced by Khot in [14].

Corollary 2. Min-TCO with $\max_{t \in T} |U_t| \le d$ ($d \ge 3$) is NP-hard to approximate
within a factor of $(d - 1 - \varepsilon)$, for any $\varepsilon > 0$, and, if the unique games
conjecture holds, there is no polynomial-time $(d - \varepsilon)$-approximation algorithm
for it.

Proof. Otherwise, the reduction described in the proof of Theorem 2 would
imply an approximation algorithm for d-HS with a ratio better than $d - 1$ and
$d$, respectively. This would directly contradict theorems proven in [11] and [15]. □



The following corollary is an improvement of the already known results of
[9], where an $O(\log |T|)$-approximation algorithm is presented, and of [2], where
a lower bound of $\Omega(\log n)$ on the approximability is shown. We close the gap
by designing a reduction that can reduce any problem from the class LOGAPX to
Min-TCO, preserving the approximation ratio up to a constant.

Corollary 3. Min-TCO is LOGAPX-complete.

Proof. Min-TCO is in the class LOGAPX since it admits a logarithmic
approximation algorithm as presented in [9]. Our reduction from the proof of
Theorem 2 is independent of $d$ and thus an AP-reduction from LOGAPX-complete
Min-HS to Min-TCO. □

3.2. A Constant Approximation Algorithm 

In this subsection, we present a reduction from Min-TCO with $\max_{t \in T} |U_t| \le d$
to $O(d^2)$-HS, thus showing that there exists a constant approximation algorithm
for Min-TCO with $\max_{t \in T} |U_t| \le d$, as d-HS is approximable within a constant
factor. Moreover, the constant approximation algorithm classifies this problem to
be a member of the class APX and thus, since the APX-hardness was proven in
Subsection 3.1, we conclude that Min-TCO with $\max_{t \in T} |U_t| \le d$ is APX-complete.

Recall that a partition of vertices $V$ in graph $G$ is a tuple $(A, B)$ such that
$A \subseteq V$, $B \subseteq V$, $A \cap B = \emptyset$, and $A \cup B = V$.

Definition 2. Let $V = \{v_1, \ldots, v_n\}$ be a set of vertices and, for every partition
$(A_i, B_i)$ of $V$, let $E_i = \{\{u, v\} \mid u \in A_i \wedge v \in B_i\}$. Then we call the system
$\mathcal{S} = \{E_1, \ldots, E_m\}$ of all sets of edges between vertices of all the partitions of
$V$ a characteristic system of edges on $V$. In other words, $\mathcal{S}$ contains all sets of
edges that form a maximum bipartite graph on $V$.

In the following lemma, we show the basic properties of characteristic sys- 
tems of edges. 

Lemma 1. Let $\mathcal{S} = \{E_1, \ldots, E_m\}$ be a characteristic system of edges on the
set $V$ of $n$ vertices. Then

1. $m = 2^{n-1} - 1$.

2. $|E_j| \le \lfloor n/2 \rfloor \cdot \lceil n/2 \rceil$, for all $j$, $1 \le j \le m$.

3. Any two sets $E_i$ and $E_j$ differ in at least $n - 1$ elements ($1 \le i < j \le m$).

4. $H \subseteq \{\{u, v\} \mid u, v \in V \wedge u \ne v\}$ is a hitting set of
$(\{\{u, v\} \mid u, v \in V \wedge u \ne v\}, \mathcal{S})$ if and only if $(V, H)$ is connected.

5. The size of $\mathcal{S}$ is minimal such that part 4 holds.
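
Parts 1, 2 and 4 of the lemma can be verified exhaustively for small $n$. The sketch below assumes that only partitions with two non-empty sides are used, as in the counting argument for part 1; the names `characteristic_system` and `is_connected` are hypothetical.

```python
from itertools import combinations


def characteristic_system(vertices):
    """All cuts E_i = {{u, v} : u in A_i, v in B_i} over partitions (A_i, B_i)."""
    vs = sorted(vertices)
    first, rest = vs[0], vs[1:]
    system = []
    # Keep vs[0] in A so that (A, B) and (B, A) are generated only once;
    # taking fewer than len(rest) extra vertices keeps B non-empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            a = {first, *extra}
            b = set(vs) - a
            system.append({frozenset((u, v)) for u in a for v in b})
    return system


def is_connected(vertices, edges):
    """Depth-first search over edges given as frozensets of two vertices."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for e in edges:
            if u in e:
                for v in e - {u}:
                    if v not in seen:
                        seen.add(v)
                        stack.append(v)
    return seen == vertices


V = [1, 2, 3, 4]
S = characteristic_system(V)
print(len(S))                  # 7, which equals 2**(4 - 1) - 1
print(max(len(E) for E in S))  # 4, which equals floor(4/2) * ceil(4/2)

# Part 4: an edge set hits every E_i exactly when it connects V.
all_edges = [frozenset(p) for p in combinations(V, 2)]
for size in range(len(all_edges) + 1):
    for H in combinations(all_edges, size):
        assert all(set(H) & E for E in S) == is_connected(V, H)
```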

Proof. Observe that the complementary graph $(V, F_j)$
($F_j = \{\{u, v\} \mid u, v \in V \wedge u \ne v\} \setminus E_j$) contains two complete graphs, one on
the vertices of $A_j$ and the other on the vertices of $B_j$, and it is a maximal graph
(in the number of edges) that is not connected. We use this observation to prove
the last two parts of our lemma.

Part 1. We count the different partitions $(A_j, B_j)$ of the vertices $V$ as each
such partition determines a different set $E_j$ of edges. There are $2^n$ ways to
distribute the vertices from $V$ into the two sides of a partition. We have to
subtract 2 possibilities for the cases where one of $A_j$ or $B_j$ is empty. Each of
the other possibilities is counted twice, once when the vertices are present in
$A_j$ and once when they are present in $B_j$, which gives $m = (2^n - 2)/2 = 2^{n-1} - 1$.

Part 2. Let the two sets of vertices $A_j$ and $B_j$ of a partition contain $k > 0$
and $n - k$ vertices. Then the size of $E_j$ is $k \cdot (n - k)$. This function reaches its
maximum for $k = n/2$ and thus we can conclude that, for all $j$, $1 \le j \le m$, we
have $|E_j| \le \lfloor n/2 \rfloor \cdot (n - \lfloor n/2 \rfloor) = \lfloor n/2 \rfloor \cdot \lceil n/2 \rceil$.

Part 3. Let us consider two different partitions $(A_i, B_i)$ and $(A_j, B_j)$ of the
vertices $V$. The sets $A_i$ and $A_j$ must differ by at least one vertex. W.l.o.g.,
let the vertex $v \in A_i$ and $v \notin A_j$. Then, due to the transition of the vertex
$v$ from $A_i$ to $B_j$, there are $|B_i|$ edges that are in $E_i$ but cannot be in $E_j$, and
there are $|A_i| - 1$ edges that are not in $E_i$, but are in $E_j$. Thus, the overall
difference in the number of elements between the sets $E_i$ and $E_j$ is at least
$|A_i| + |B_i| - 1 = n - 1$.

Part 4. First, we prove the if case. Suppose that $H$ is a hitting set, but
$(V, H)$ is not connected. Since $\mathcal{S}$ contains complements of all maximal sets of
edges that induce a disconnected graph, there exists $j$ ($1 \le j \le m$) such that
$H \subseteq F_j$. But then, since $E_j$ is complementary to $F_j$, it follows that $E_j \cap H = \emptyset$.
Thus, $H$ cannot be a hitting set as $E_j$ is not hit.

For the only-if case, suppose that $(V, H)$ is connected, but $H$ is not a hitting
set of $(\{\{u, v\} \mid u, v \in V \wedge u \ne v\}, \mathcal{S})$. Then there exists $j$ such that $E_j$ is not
hit by $H$ and thus $H \subseteq F_j$. Yet in such a case, by the observation above, $(V, F_j)$ is
not connected and thus $(V, H)$ cannot be connected either.

Part 5. Let $\mathcal{S}' = \mathcal{S} \setminus \{E_j\}$ and let $(\{\{u, v\} \mid u, v \in V \wedge u \ne v\}, \mathcal{S}')$ be an
instance of Min-HS. Then we claim that $F_j$ is a hitting set of
$(\{\{u, v\} \mid u, v \in V \wedge u \ne v\}, \mathcal{S}')$. First, observe that $F_j \ne \emptyset$ since $E_j$ cannot
contain all the edges. Moreover, for every $E_i \in \mathcal{S}'$, there exists $e \in E_i$ such that
$e \notin E_j$. Then $e \in \{\{u, v\} \mid u, v \in V \wedge u \ne v\} \setminus E_j = F_j$ and thus $F_j$ is a hitting
set. However, by the definition of $F_j$, $(V, F_j)$ cannot be connected and thus the if
case of part 4 does not hold. □

Now we are ready to present a simple one-to-one reduction of Min-TCO with
$\max_{t \in T} |U_t| \le d$ to $O(d^2)$-HS. The core concept is to construct the system of
sets that has to be hit in $O(d^2)$-HS as a union over all the topics of the
characteristic systems of edges on the vertices interested in the topic.

Theorem 3. There exists a one-to-one reduction of instances of Min-TCO with
$\max_{t \in T} |U_t| \le d$ to instances of $O(d^2)$-HS.

Proof. Let $I_{TCO} = (U, T, \mathrm{INT})$ be an instance of Min-TCO with $\max_{t \in T} |U_t| \le d$.
For each topic $t \in T$ we define $\mathcal{S}_t$ to be the characteristic system of edges on
the vertices in $U_t$. Note that Lemma 1 holds for each $\mathcal{S}_t$ with $n := d$. We construct
an $O(d^2)$-HS instance $I_{HS} = (X, \mathcal{S})$ as follows:

$X = \{\{u, v\} \mid u, v \in U \wedge u \ne v\}$,
$\mathcal{S} = \bigcup_{t \in T} \mathcal{S}_t$.

The system contains $\binom{|U|}{2}$ elements and at most $|T| \cdot (2^{d-1} - 1)$ sets in $\mathcal{S}$ and
thus has a size polynomial in $|I_{TCO}|$. Obviously, the construction of $I_{HS}$ takes
time polynomial in $|I_{TCO}|$, too. We now show that a feasible solution of $I_{TCO}$
corresponds to a feasible solution of $I_{HS}$ and vice versa.

First, consider a feasible solution $\mathit{Sol}_{HS}$ of $I_{HS}$ and a topic $t \in T$. Due to
our construction, the system $\mathcal{S}$ contains the characteristic system $\mathcal{S}_t$ on vertices
$U_t$. Therefore, by Lemma 1 part 4 and the fact that $\mathit{Sol}_{HS}$ is a hitting set, we
know that the graph induced by the edges in $\mathit{Sol}_{HS}$ on vertices $U_t$ is connected.

Now, consider a feasible solution $\mathit{Sol}_{TCO}$ of $I_{TCO}$. By the following argument,
we can easily see that $\mathit{Sol}_{TCO}$ hits all the sets in $\mathcal{S}$. Let $P \in \mathcal{S}$ be a set
that is not hit by $\mathit{Sol}_{TCO}$. Then there exists $t$ such that $P \in \mathcal{S}_t$ and thus a set of
the characteristic system was not hit and $\mathit{Sol}_{TCO}$ is not a hitting set of $\mathcal{S}_t$. Yet
in such a case, considering Lemma 1 part 4, the subgraph induced on vertices
$U_t$ by edges from $\mathit{Sol}_{TCO}$ cannot be connected, which contradicts
the definition of Min-TCO with $\max_{t \in T} |U_t| \le d$. □

Theorem 4. There exists a polynomial-time $(\lfloor d/2 \rfloor \cdot \lceil d/2 \rceil)$-approximation
algorithm for Min-TCO with $\max_{t \in T} |U_t| \le d$.

Proof. We employ the reduction from Theorem 3 together with the well-known
d-approximation algorithm for d-HS. Since the size of each set in $\mathcal{S}$ is at most
$\lfloor d/2 \rfloor \cdot \lceil d/2 \rceil$ (Lemma 1 part 2), by application of this approximation algorithm
on the $O(d^2)$-HS instance $(X, \mathcal{S})$ we obtain a $(\lfloor d/2 \rfloor \cdot \lceil d/2 \rceil)$-approximate solution
of our Min-TCO instance with $\max_{t \in T} |U_t| \le d$.

Note that our reduction is tight in the size of $\mathcal{S}$ as it is minimal (Lemma 1
part 5), thus to achieve an improvement in the approximation algorithm, a
different method has to be developed. □
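
The reduction of Theorem 3 combined with a simple hitting set heuristic can be sketched as follows. This is an illustration under assumptions: the instance is given by the sets $U_t$, the cuts of each characteristic system are enumerated explicitly (exponential in $d$, which is a constant here), and the hitting set step is the folklore rule of adding every set that is not yet hit. The names `cut_sets` and `approx_min_tco` are hypothetical.

```python
from itertools import combinations


def cut_sets(members):
    """Characteristic system of edges on one topic's user set U_t."""
    vs = sorted(members)
    first, rest = vs[0], vs[1:]
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            a = {first, *extra}
            yield {frozenset((u, v)) for u in a for v in set(vs) - a}


def approx_min_tco(users_of_topic):
    """Hit every cut of every topic; an unhit cut is added completely.

    The added cuts are pairwise disjoint and each has at most
    floor(d/2) * ceil(d/2) edges, which bounds the approximation ratio.
    """
    edges = set()
    for members in users_of_topic.values():
        for cut in cut_sets(members):
            if not edges & cut:
                edges |= cut
    return edges


instance = {"x": {"a", "b", "c"}, "y": {"b", "c", "d"}}
solution = approx_min_tco(instance)
print(len(solution))  # 4
```

For this instance with $d = 3$ the optimum is 3 edges and the guaranteed ratio is $\lfloor 3/2 \rfloor \cdot \lceil 3/2 \rceil = 2$, so any result with at most 6 edges is within the bound.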

Corollary 4. Min-TCO with $\max_{t \in T} |U_t| \le 3$ inherits the approximation
hardness of Min-VC.

Corollary 5. Min-TCO with $\max_{t \in T} |U_t| \le d$ is APX-complete, for arbitrary
$d \ge 3$.

Proof. The APX-hardness follows from the APX-hardness of d-HS ([21]). Due
to our reduction the problem belongs to the class APX. □

3.3. Min-TCO and Parametrized Complexity Theory 

In this subsection, we briefly summarize the consequences of our reduction
from Theorem 3 for the field of parametrized complexity, namely we present an
exact algorithm and a kernelization for Min-TCO with $\max_{t \in T} |U_t|$ bounded by
a constant.

In the research area of exact algorithm design, one searches for an exact 
solution in exponential time. The main goal is to make the base of the expo- 
nentiation as small as possible. 

A kernelization is a process in which an instance is reduced to a smaller 
instance in polynomial time. Then, instead of solving the original instance, it is 
sufficient to solve the problem on the smaller one and then, in polynomial time, 
transform its solution back to the initial instance. 

Problem 2. Min-d-TCO(k) is the following parametrized problem:
Input: Instance of Min-TCO with $\max_{t \in T} |U_t| \le d$ and a parameter $k$.
Goal: A feasible solution of the Min-TCO instance of size at most $k$.

Problem 3. d-HS(k) is the following parametrized problem:

Input: Instance of d-HS and a parameter $k$.

Goal: A feasible solution of the d-HS instance of size at most $k$.

We first transform the given instance of Min-TCO with $\max_{t \in T} |U_t| \le d$ into
an instance of $O(d^2)$-HS as in Theorem 3, and then we apply the kernelization
from [16] to obtain a kernel of Min-d-TCO(k), or the exact algorithm from [19]
to obtain the first nontrivial exact algorithm for solving Min-d-TCO(k).

Theorem 5. Min-d-TCO(k) has a kernel of size $(2c - 1) \cdot k^{c-1} + k$ with
$c = \lfloor d/2 \rfloor \cdot \lceil d/2 \rceil$.

Theorem 6. Min-d-TCO(k) on $n$ vertices can be solved in time $O(c^k + n^2)$
with $c = \lfloor d/2 \rfloor \cdot \lceil d/2 \rceil - 1 + O(d^{-2})$.

4. Hardness of Min-TCO When the Number of Connections of a User 
is Constant 

It is natural to consider Min-TCO with a bounded number of connections per
user, i.e., to bound $\max_{u \in U} |\mathrm{INT}(u)|$, since the number of topics in which one
user is interested is usually not too large. We show that, sadly, Min-TCO is
APX-hard even if $\max_{u \in U} |\mathrm{INT}(u)| \le 6$. To show this, we design a reduction
from minimum vertex cover (Min-VC) to Min-TCO. The minimum vertex cover
problem is just a different name for d-HS with $d = 2$. For a better presentation,
in this section, we refer to Min-VC instead of 2-HS.

Given is a graph $G = (V', E')$ and a positive integer $k$ as an instance of
Min-VC, where the goal is to decide whether the given graph has a solution of size
at most $k$. We construct an instance of Min-TCO as follows. Let $V = V^{(1)} \cup V^{(2)}$
be the set of users, where $V^{(1)} = \{v^{(1)} \mid v \in V'\}$ and $V^{(2)} = \{v^{(2)} \mid v \in V'\}$. For
each edge $e \in E'$, we prepare three topics, $t_e^{(0)}$, $t_e^{(1)}$ and $t_e^{(2)}$. The set of topics is
the union of all these topics, i.e., $T = \bigcup_{e \in E'} \{t_e^{(0)}, t_e^{(1)}, t_e^{(2)}\}$. The user interest
function INT is defined as

$\mathrm{INT}(u^{(i)}) = \bigcup_{e \in E'[N[u]]} \{t_e^{(0)}, t_e^{(i)}\}$.

The following lemma shows the relation between the solutions of the two prob- 
lems. 

Lemma 2. The instance $(V, T, \mathrm{INT})$ of Min-TCO defined as above has an optimal
solution of cost $k + 2|E'|$ if and only if the instance $(V', E')$ of Min-VC has
an optimal solution of cost $k$.

Moreover, any feasible solution $H$ of $(V, T, \mathrm{INT})$ can be transformed into a
feasible solution of $(V', E')$ of cost at most $|H| - 2|E'|$.

Proof. It is obvious that any feasible solution $H$ of the instance of Min-TCO
contains the edge $\{u^{(i)}, v^{(i)}\}$ ($i \in \{1, 2\}$), for every edge $e = \{u, v\}$, because only
$u^{(1)}$ and $v^{(1)}$ (resp., $u^{(2)}$ and $v^{(2)}$) are interested in topic $t_e^{(1)}$ (resp., $t_e^{(2)}$).

Since each feasible solution $H$ of $(V, T, \mathrm{INT})$ must contain the edges
$\{u^{(1)}, v^{(1)}\}$ and $\{u^{(2)}, v^{(2)}\}$, for every edge $e = \{u, v\} \in E'$, it is sufficient to
consider only the topics $t_e^{(0)}$. The number of edges in $H$ connecting a vertex from
$V^{(1)}$ with a vertex from $V^{(2)}$ is at most $|H| - 2|E'|$.

For an edge $e = \{u, v\}$, the vertices that are interested in $t_e^{(0)}$ are $u^{(1)}$, $v^{(1)}$,
$u^{(2)}$ and $v^{(2)}$. Since these four vertices have to be connected, $H$ contains at least
one edge of $\{u^{(1)}, u^{(2)}\}$, $\{v^{(1)}, v^{(2)}\}$, $\{u^{(1)}, v^{(2)}\}$ and $\{v^{(1)}, u^{(2)}\}$.

The optimal solution of $(V, T, \mathrm{INT})$ contains at most two of these four edges,
namely the edges $\{u^{(1)}, u^{(2)}\}$ and $\{v^{(1)}, v^{(2)}\}$. Observe that, for each edge $f$ that
is incident with vertex $u$ in $G$, the edge $\{u^{(1)}, u^{(2)}\}$ connects the solution to be
$t_f^{(0)}$-connected. The only topic that the other two edges connect is $t_{\{u,v\}}^{(0)}$ and
thus they can be replaced by $\{u^{(1)}, u^{(2)}\}$ or $\{v^{(1)}, v^{(2)}\}$.

In any non-optimal solution, more than two of the four edges may be present
and the replacement of edges $\{u^{(1)}, v^{(2)}\}$ and $\{v^{(1)}, u^{(2)}\}$ by $\{u^{(1)}, u^{(2)}\}$ and
$\{v^{(1)}, v^{(2)}\}$, respectively, may lead to a decrease of the cost of the solution.

We assume that the solution of $(V, T, \mathrm{INT})$ has been transformed so that it
does not contain cross edges between $u^{(i)}$ and $v^{(3-i)}$ ($i \in \{1, 2\}$). The vertices
that correspond to the edges between the two layers $V^{(1)}$ and $V^{(2)}$ form a
feasible solution of Min-VC. As discussed above, its size is at most $|H| - 2|E'|$
for a feasible solution $H$ and exactly $|H| - 2|E'|$ for an optimal solution $H$. This
proves one implication of the first claim and the second claim.

We show that, if Min-VC has an optimal solution of size $k$, then the instance
of Min-TCO has an optimal solution of size $k + 2|E'|$. From an optimal solution
$W \subseteq V'$ of Min-VC, we construct the optimal solution of Min-TCO as
$H = \{\{u^{(1)}, v^{(1)}\}, \{u^{(2)}, v^{(2)}\} \mid \{u, v\} \in E'\} \cup \{\{u^{(1)}, u^{(2)}\} \mid u \in W\}$. Clearly, the
size of $H$ is exactly $k + 2|E'|$. As $W$ is the smallest set of vertices that covers
all the edges of $E'$, its corresponding edges of Min-TCO produce the minimal
set of edges that connect every topic with superscript 0. Thus, $H$ satisfies the
connectivity requirement for every topic $t \in T$ and is optimal. □
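
The construction and the cost relation of Lemma 2 can be checked on a tiny graph. The sketch below assumes, in line with how the proof uses the definition, that user $u^{(i)}$ is interested in the topics $t_e^{(0)}$ and $t_e^{(i)}$ of the edges $e$ incident with $u$; user $u^{(i)}$ is encoded as the pair `(u, i)`, and the names `vc_to_tco`, `connected` and `min_tco_cost` are hypothetical. The exhaustive search is exponential and meant for toy inputs only.

```python
from itertools import combinations


def vc_to_tco(graph_edges):
    """Build the sets U_t of the Min-TCO instance for a Min-VC graph."""
    users_of_topic = {}
    for u, v in graph_edges:
        e = frozenset((u, v))
        users_of_topic[(e, 0)] = {(u, 1), (v, 1), (u, 2), (v, 2)}
        users_of_topic[(e, 1)] = {(u, 1), (v, 1)}
        users_of_topic[(e, 2)] = {(u, 2), (v, 2)}
    return users_of_topic


def connected(members, edges):
    """Check that `edges` connect `members` without leaving `members`."""
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == u and y in members and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == members


def min_tco_cost(users_of_topic):
    """Optimal cost by trying edge sets of increasing size."""
    users = sorted(set().union(*users_of_topic.values()))
    pairs = list(combinations(users, 2))
    for size in range(len(pairs) + 1):
        for cand in combinations(pairs, size):
            if all(connected(m, cand) for m in users_of_topic.values()):
                return size


# Path a - b - c: the minimum vertex cover {b} has size k = 1 and |E'| = 2.
path = [("a", "b"), ("b", "c")]
print(min_tco_cost(vc_to_tco(path)))  # 5
```

The value 5 equals $k + 2|E'| = 1 + 4$: four edges are forced inside the two layers, and the single cross edge between the two copies of b serves both topics with superscript 0.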

We use Min-VC on degree-bounded graphs, which is APX-hard, to show
lower bounds for our restricted Min-TCO. By the above reduction and the
lemma, we prove the following theorem.

Theorem 7. Min-TCO with $\max_{u \in U} |\mathrm{INT}(u)| \le 6$ cannot be approximated within
a factor of 694/693 in polynomial time, unless P = NP, even if
$|\mathrm{INT}(u) \cap \mathrm{INT}(v)| \le 3$ holds for every pair of different users $u, v \in U$.

Proof. We prove the statement by contradiction. Suppose that there exists an approximation algorithm $A$ for Min-TCO with the above stated restrictions that has the ratio $(1 + \delta)$.

Let $G = (V',E')$ be an instance of Min-VC and let $G$ be cubic and regular (i.e., each vertex is incident with exactly three edges). We construct an instance $I_{\mathrm{TCO}}$ of Min-TCO as stated above and we apply our algorithm $A$ to it to obtain a feasible solution $Sol_{\mathrm{TCO}}$. From such a solution, by Lemma 2, we create a feasible solution $Sol_{\mathrm{VC}}$ of the original Min-VC instance. We denote by $Opt_{\mathrm{TCO}}$ and $Opt_{\mathrm{VC}}$ the optimal solutions of $I_{\mathrm{TCO}}$ and $G$, respectively.

Let $d$ be a constant such that $d \cdot \mathrm{cost}(Opt_{\mathrm{VC}}) = 3|V'|$. Since $G$ is cubic and regular, $\mathrm{cost}(Opt_{\mathrm{VC}}) \ge |E'|/3 = |V'|/2$ and thus $d \le 6$.

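Observe that, due to Lemma 2, $\mathrm{cost}(Opt_{\mathrm{TCO}}) = \mathrm{cost}(Opt_{\mathrm{VC}}) + 2|E'| = \mathrm{cost}(Opt_{\mathrm{VC}}) + d \cdot \mathrm{cost}(Opt_{\mathrm{VC}})$ and $\mathrm{cost}(Sol_{\mathrm{TCO}}) \ge \mathrm{cost}(Sol_{\mathrm{VC}}) + 2|E'| = \mathrm{cost}(Sol_{\mathrm{VC}}) + d \cdot \mathrm{cost}(Opt_{\mathrm{VC}})$. These two estimations give us the following bound:

\[
\frac{\mathrm{cost}(Sol_{\mathrm{VC}}) + d \cdot \mathrm{cost}(Opt_{\mathrm{VC}})}{\mathrm{cost}(Opt_{\mathrm{VC}}) + d \cdot \mathrm{cost}(Opt_{\mathrm{VC}})} \le \frac{\mathrm{cost}(Sol_{\mathrm{TCO}})}{\mathrm{cost}(Opt_{\mathrm{TCO}})} \le 1 + \delta.
\]
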
The above inequality allows us to bound the ratio of our Min-VC solution $Sol_{\mathrm{VC}}$ and the optimal solution $Opt_{\mathrm{VC}}$:

\[
\frac{\mathrm{cost}(Sol_{\mathrm{VC}})}{\mathrm{cost}(Opt_{\mathrm{VC}})} \le (1+\delta)(d+1) - d = 1 + \delta(d+1) \le 1 + 7\delta.
\]

For $\delta := \frac{1}{693}$, we obtain a $\frac{100}{99}$-approximation algorithm for Min-VC on 3-regular graphs, which directly contradicts a theorem proven in [8].

□
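
The arithmetic behind the constants in this proof can be checked with exact rationals. The short Python sketch below evaluates $(1+\delta)(d+1) - d$ for the worst case $d = 6$; the function name `vc_ratio_bound` is introduced here only for illustration.

```python
from fractions import Fraction

def vc_ratio_bound(delta, d):
    """Upper bound (1 + delta)(d + 1) - d on cost(Sol_VC) / cost(Opt_VC)."""
    return (1 + delta) * (d + 1) - d

delta = Fraction(1, 693)
worst = vc_ratio_bound(delta, 6)   # d <= 6, and the bound grows with d
print(worst)                       # 100/99
```

Since the bound equals $1 + \delta(d+1)$, it is increasing in $d$, so $d = 6$ is indeed the worst case.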

Corollary 6. Min-TCO with $\max_{u \in U} |\mathrm{INT}(u)| \le 6$ is APX-hard.

Corollary 7. Min-TCO with $|\mathrm{INT}(v) \cap \mathrm{INT}(u)| \le 3$, for all users $u,v \in U$, is APX-hard.

This result is almost tight; the case when $|\mathrm{INT}(v) \cap \mathrm{INT}(u)| \le 2$ is still open. The following theorem shows that Min-TCO with $|\mathrm{INT}(v) \cap \mathrm{INT}(u)| \le 1$, for every pair of distinct users $u,v \in U$, can be solved in linear time.

Theorem 8. Min-TCO can be solved in linear time, if $|\mathrm{INT}(u) \cap \mathrm{INT}(v)| \le 1$ holds for every pair of users $u,v \in U$, $u \ne v$.

Proof. We execute the following simple algorithm. First set the solution $E := \emptyset$. Then sequentially, for each topic $t$, choose its representative $v^* \in U$ ($t \in \mathrm{INT}(v^*)$) and add edges $\{\{v^*,u\} \mid u \in U_t \setminus \{v^*\}\}$ to the solution $E$. We show that, if $|\mathrm{INT}(u) \cap \mathrm{INT}(v)| \le 1$, for all distinct $u,v \in U$, then the solution $E$ is optimal.

Observe that, in our case, any edge in any feasible solution is present because of a unique topic. We cannot find an edge $e = \{u, v\}$ of the solution that belongs to the subgraphs for two different topics. (Otherwise $|\mathrm{INT}(v) \cap \mathrm{INT}(u)| > 1$ and our assumption would be wrong for the two endpoints of the edge $e$.) Thus, any solution consisting of spanning trees for every topic is feasible and optimal. Note that its size is $\sum_{t \in T} (|U_t| - 1)$, which is at most $|T| \cdot (|U| - 1)$. □
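
The algorithm from this proof can be sketched in a few lines of Python. The sketch assumes the instance is given as a dictionary from users to their sets of topics and that the pairwise-intersection condition of Theorem 8 holds; the name `star_overlay` and the choice of the first listed user as representative are illustrative assumptions.

```python
from collections import defaultdict

def star_overlay(interests):
    """Connect the users of each topic by a star centred at a representative."""
    users_of = defaultdict(list)
    for user, topics in interests.items():
        for t in topics:
            users_of[t].append(user)
    overlay = set()
    for members in users_of.values():
        rep = members[0]                      # the representative v* of the topic
        for u in members[1:]:
            overlay.add(frozenset((rep, u)))  # edge {v*, u}
    return overlay

# Any two users share at most one topic in this small example.
interests = {"a": {1, 2}, "b": {1, 3}, "c": {2, 3}, "d": {1}}
overlay = star_overlay(interests)
print(len(overlay))  # 4
```

Both loops touch every (user, topic) pair a constant number of times, so the running time is linear in the total length of the interest lists.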

Corollary 8. Min-TCO with $\max_{u \in U} |\mathrm{INT}(u)| \le 2$ can be solved in linear time.

5. A Polynomial-Time Algorithm for Min-TCO with Bounded Number 
of Topics 

In this section, we present a simple brute-force algorithm that achieves a polynomial running time when the number of topics is bounded by $|T| \le \log\log|U| - \frac{3}{2}\log\log\log|U|$.

Theorem 9. The optimal solution of Min-TCO can be computed in polynomial time if $|T| \le (1 + \varepsilon(|U|))^{-1} \cdot \log\log|U|$, for a function

\[
\varepsilon(n) \ge \frac{3/2 \, \log\log\log n}{\log\log n - 3/2 \, \log\log\log n}.
\]


Proof. Let $(U,T,\mathrm{INT})$ be an instance of Min-TCO such that $|T| \le (1 + \varepsilon(|U|))^{-1} \cdot \log\log|U|$. Moreover, $|T| > 2$, otherwise the problem is solvable in polynomial time. We shorten the notation by setting $t = |T|$ and $n = |U|$.
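
As a side calculation (a sketch that takes $\varepsilon(n)$ at its lower bound from Theorem 9), the condition on $t$ simplifies as follows:

\[
1 + \varepsilon(n) = \frac{\log\log n}{\log\log n - \frac{3}{2}\log\log\log n}
\quad\Longrightarrow\quad
(1 + \varepsilon(n))^{-1} \cdot \log\log n = \log\log n - \tfrac{3}{2}\log\log\log n .
\]

This matches the bound on the number of topics stated at the beginning of this section.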

First observe that, if $u,v \in U$ and $\mathrm{INT}(u) \subseteq \mathrm{INT}(v)$, instead of solving instance $(U,T,\mathrm{INT})$, we can solve Min-TCO on instance $(U \setminus \{u\},T,\mathrm{INT})$ and add to such a solution the direct edge $\{u, v\}$. Note that $u$ has to be incident with at least one edge in any solution. Thus, the addition of the edge $\{u, v\}$ cannot increase the cost. Moreover, any other user that would be connected to $u$ in some solution can also be connected to $v$. Thus, we can remove $u$, solve the smaller instance and then add $u$ by a single edge. Such a solution is feasible and its size is unchanged. We say that vertex $u$ is dominated by the vertex $v$ if $\mathrm{INT}(u) \subseteq \mathrm{INT}(v)$.

Therefore, before applying our simple algorithm, we remove from the instance all the users that are dominated by some other user (among users with identical sets of topics, one is kept). We denote the set of remaining users (i.e., those with incomparable sets of interesting topics) by $M$ and its size by $m$. The largest system of incomparable sets on $n$ elements is called a Sperner system and it is a well-known fact that its size is at most $\binom{n}{\lfloor n/2 \rfloor}$. Since every user in $M$ must have a different set of interesting topics and these sets are all incomparable, we have

\[
m = |M| \le \binom{t}{\lfloor t/2 \rfloor} \le \frac{2^t}{\sqrt{t}}.
\]

(To verify the bound, consider $t$ to be odd or even and use $\binom{2n}{n} \le \frac{2^{2n}}{\sqrt{2n}}$, $n \ge 1$.)
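
The Sperner bound and the estimate $2^t/\sqrt{t}$ that is used for $m$ in case B below can be compared numerically. The following Python sketch does this with exact integer arithmetic; the function names are illustrative and the check covers only a finite range of $t$, so it is a sanity check rather than a proof.

```python
from math import comb

def sperner_bound(t):
    """Maximum number of pairwise incomparable subsets of a t-element set."""
    return comb(t, t // 2)

def within_estimate(t):
    """Check comb(t, t // 2) <= 2**t / sqrt(t), squared to stay in the integers."""
    return sperner_bound(t) ** 2 * t <= 4 ** t

print(sperner_bound(4))                                # 6
print(all(within_estimate(t) for t in range(1, 200)))  # True
```
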
Our simple algorithm exhaustively searches over all the possible solutions on instance $(M, T, \mathrm{INT})$ and then reconnects each of the removed users $U \setminus M$ by a single edge. The transformation to set $M$ and the connection of the removed users is clearly polynomial. Thus, we only need to show that our exhaustive search is polynomial.
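
The preprocessing step can be sketched in Python as follows. The sketch assumes the instance is a dictionary from users to sets of topics; processing users by decreasing number of topics is one possible way to keep exactly one user among those with identical topic sets, and the names `remove_dominated` and `reconnect` are hypothetical.

```python
def remove_dominated(interests):
    """Return (kept users M, map from each removed user to a kept dominating user)."""
    # Larger interest sets first, so a dominating user is seen before the dominated one.
    order = sorted(interests, key=lambda u: len(interests[u]), reverse=True)
    kept, parent = [], {}
    for u in order:
        dominator = next((v for v in kept if interests[u] <= interests[v]), None)
        if dominator is None:
            kept.append(u)
        else:
            parent[u] = dominator
    return kept, parent

def reconnect(parent):
    """One edge {u, v} per removed user u, where v dominates u."""
    return {frozenset((u, v)) for u, v in parent.items()}

interests = {"a": {1, 2, 3}, "b": {1, 2}, "c": {3}, "d": {3, 4}, "e": {1, 2}}
kept, parent = remove_dominated(interests)
print(kept)    # ['a', 'd']
print(parent)  # {'b': 'a', 'e': 'a', 'c': 'a'}
```

A solution for the whole instance is then the union of an optimal overlay on the kept users and the edges returned by `reconnect`.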

Observe that the size of the optimal solution is at most $t(m - 1)$, as merged spanning trees, for all the topics, form a feasible solution. Our algorithm exhaustively searches over all possible solutions, i.e., it tries every possible set of $i$ edges for $1 \le i \le t(m - 1)$ and verifies the topic-connectivity requirements for such sets of edges. The verification of each set can be done in polynomial time. The number of sets it checks can be bounded as follows:



t(m-l) 

E 



(") 



I 





tm 


/ 2\ 


< 






< 


tm ■ 


\tm. 


< 


tm ■ 


m*™ 



< TO*"' • 0(log2 n) 

(Note that $tm \le m^2/2$ and thus the binomial coefficient is maximal in $tm$. Otherwise the number of all possible choices of edges into a solution is polynomial in $n$.)
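
The exhaustive search itself can be sketched in Python as below. The sketch assumes the instance is a dictionary from users to sets of topics and that dominated users have already been removed. It tries edge sets in order of increasing size and stops at the first topic-connected one, which is a small variation on enumerating all sizes up to $t(m-1)$. The function names are hypothetical.

```python
from itertools import combinations

def topic_connected(interests, edges):
    """Check that the users of every topic induce a connected subgraph."""
    topics = set().union(*interests.values())
    for t in topics:
        members = {u for u in interests if t in interests[u]}
        adj = {u: set() for u in members}
        for e in edges:
            if e <= members:          # both endpoints are interested in t
                a, b = e
                adj[a].add(b)
                adj[b].add(a)
        start = next(iter(members))
        seen, stack = {start}, [start]
        while stack:
            for y in adj[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if seen != members:
            return False
    return True

def brute_force_min_tco(interests):
    """Smallest topic-connected edge set, found by trying all sets of each size."""
    pairs = [frozenset(p) for p in combinations(list(interests), 2)]
    for size in range(len(pairs) + 1):
        for candidate in combinations(pairs, size):
            if topic_connected(interests, candidate):
                return set(candidate)
    return None  # not reached: the complete graph is always topic-connected

# Three users with pairwise incomparable interests.
solution = brute_force_min_tco({"a": {1, 2}, "b": {2, 3}, "c": {1, 3}})
print(len(solution))  # 3
```

The search is exponential in general; it is only the bound on $t$, and hence on $m$, that keeps the number of candidate sets polynomial in $n$.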

To check a polynomial number of sets, it is sufficient to bound the factor $m^{tm}$ by a polynomial, i.e., by at most $n^c$ for some $c > 0$. (In all our calculations, log stands for the binary logarithm; however, any other logarithm can be used, as the change will affect the exponent only by a constant.) We consider two cases:

A: First assume that $t \le \frac{\log\log n}{1+2\varepsilon(n)}$, then $m \le 2^t \le (\log n)^{(1+2\varepsilon(n))^{-1}}$.

We use the upper bounds on $t$ and $m$ to estimate the number of sets our exhaustive search has to check:

\[
m^{tm} \le (\log n)^{(1+2\varepsilon(n))^{-2} \cdot \log\log n \cdot (\log n)^{(1+2\varepsilon(n))^{-1}}} \le n^c.
\]

Then we take the logarithm of the inequality, leading to

\[
(1+2\varepsilon(n))^{-2} \cdot (\log\log n)^2 \le c \cdot (\log n)^{\frac{2\varepsilon(n)}{1+2\varepsilon(n)}}.
\]

After another logarithm operation, we obtain the following inequality:

\[
-2\log(1+2\varepsilon(n)) + 2\log\log\log n \le \frac{2\varepsilon(n)}{1+2\varepsilon(n)} \cdot \log\log n + \log c.
\]

We prove inequality (1) instead. In the end, we will see that the function $\varepsilon(n)$ is positive, except for the first few values. Thus, for large inputs, $2\log(1+2\varepsilon(n))$ is positive and thus the above inequalities will hold, too.

\[
2\log\log\log n \le \frac{2\varepsilon(n)}{1+2\varepsilon(n)} \cdot \log\log n \qquad (1)
\]

We are now able to estimate the function $\varepsilon(n)$:

\[
\varepsilon(n) \ge \frac{\log\log\log n}{\log\log n - 2\log\log\log n}. \qquad (2)
\]

Due to the following case, we use $\varepsilon(n) \ge \frac{3/2 \, \log\log\log n}{\log\log n - 3/2 \, \log\log\log n}$, which also satisfies (2) and is positive for $n > 16$.
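B: To conclude the proof, assume that $\frac{\log\log n}{1+2\varepsilon(n)} < t \le \frac{\log\log n}{1+\varepsilon(n)}$. Since we have both an upper and a lower bound on $t$, we can refine the estimation of $m$:

\[
m \le \frac{2^t}{\sqrt{t}} \le (\log n)^{(1+\varepsilon(n))^{-1}} \cdot (1+2\varepsilon(n))^{1/2} \cdot (\log\log n)^{-1/2}.
\]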



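We show that $m^{tm}$ is polynomial in $n$ similarly as in the previous case:

\[
m^{tm} \le (\log n)^{(1+\varepsilon(n))^{-2} \cdot (\log\log n)^{1/2} \cdot (\log n)^{(1+\varepsilon(n))^{-1}} \cdot (1+2\varepsilon(n))^{1/2}} \le n^c.
\]

Then we take the logarithm of the inequality, leading to

\[
(1+\varepsilon(n))^{-2} \cdot (\log\log n)^{3/2} \cdot (1+2\varepsilon(n))^{1/2} \le c \cdot (\log n)^{\frac{\varepsilon(n)}{1+\varepsilon(n)}}.
\]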

Assume that $(1+2\varepsilon(n))^{1/2} \le 1+\varepsilon(n)$, except for the first few values, then it is sufficient to prove a simpler inequality:

\[
(1+\varepsilon(n))^{-1} \cdot (\log\log n)^{3/2} \le c \cdot (\log n)^{\frac{\varepsilon(n)}{1+\varepsilon(n)}}.
\]

After another logarithm operation, we obtain the following inequality:

\[
-\log(1+\varepsilon(n)) + 3/2 \cdot \log\log\log n \le \frac{\varepsilon(n)}{1+\varepsilon(n)} \cdot \log\log n + \log c.
\]

Again, assuming that $\log(1+\varepsilon(n)) \ge 0$ if $n$ tends to infinity, to prove the above inequality, it is sufficient to show that

\[
3/2 \, \log\log\log n \le \frac{\varepsilon(n)}{1+\varepsilon(n)} \cdot \log\log n.
\]

Thus we are able to bound the function $\varepsilon(n)$ as

\[
\varepsilon(n) \ge \frac{3/2 \, \log\log\log n}{\log\log n - 3/2 \, \log\log\log n}.
\]
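
To get a feeling for the size of this bound, the following Python sketch evaluates $\varepsilon(n)$ at its lower bound and the resulting limit $(1+\varepsilon(n))^{-1} \cdot \log\log n$ on the number of topics. It uses binary logarithms as in the proof; the function names are illustrative.

```python
from math import log2

def epsilon(n):
    """The lower bound on epsilon(n), with binary logarithms."""
    ll = log2(log2(n))
    lll = log2(ll)
    return 1.5 * lll / (ll - 1.5 * lll)

def topic_bound(n):
    """Largest t allowed by t <= (1 + epsilon(n))**-1 * log log n."""
    return log2(log2(n)) / (1 + epsilon(n))

for n in (2 ** 16, 2 ** 256):
    print(epsilon(n), topic_bound(n))
```

For $|U| = 2^{16}$ the bound evaluates to 1 and for $|U| = 2^{256}$ to 3.5, which illustrates how slowly the admissible number of topics grows with the number of users.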

Observe that $\varepsilon(n) > 0$ for $n > 16$, thus both assumptions that we made hold for $|U| > 16$, which concludes the proof.

6. Conclusion

In this paper, we have closed the gap in the approximation hardness of Min-TCO by showing its LOGAPX-completeness. We studied a subproblem of Min-TCO where the number of users interested in a common topic is bounded by a constant $d$. We showed that, if $d \le 2$, the restricted Min-TCO is in P and, if $d \ge 3$, it is APX-complete. The latter result, together with the constant approximation algorithm we presented, allows us to prove lower bounds on approximability of these special instances that match any lower bound known for any problem from the class APX. Furthermore, we studied instances of Min-TCO where the number of topics in which a single user is interested is bounded by a constant $d$. We presented a reduction that shows that such instances are APX-hard for $d = 6$. In this reduction, any two users have at most three common topics, thus the reduction also shows that Min-TCO restricted in this way is APX-hard. We also investigated Min-TCO with a bounded number of topics. Here we presented a polynomial-time algorithm for $|T| \le (1+\varepsilon(|U|))^{-1} \cdot \log\log|U|$ and a function $\varepsilon(n) \ge \frac{3/2 \, \log\log\log n}{\log\log n - 3/2 \, \log\log\log n}$. The case where $t = \omega(\log\log n)$ and $t = o(n)$ remains a challenging open problem.